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Recovery of sparse signals from compressed measurements constitutes an norm minimization problem, which is unpractical 
to solve. A number of sparse recovery approaches have appeared in the literature, including {\ minimization techniques, greedy 
.pursuit algorithms, Bayesian methods and nonconvex optimization techniques among others. This manuscript introduces a novel 
"two-stage greedy approach, called the Forward-Backward Pursuit (FBP). FBP is an iterative approach where each iteration consists 
.of consecutive forward and backward stages. The forward step first expands the support estimate by the forward step size, while the 
ps) "following backward step shrinks it by the backward step size. The forward step size is higher than the backward step size, hence the 
.initially empty support estimate is expanded at the end of each iteration. Forward and backward steps are iterated until the residual 
"power of the observation vector falls below a threshold. This structure of FBP does not necessitate the sparsity level to be known 
\m) .a priori in contrast to the Subspace Pursuit or Compressive Sampling Matching Pursuit algorithms. FBP recovery performance is 
"demonstrated via simulations including recovery of random sparse signals with different nonzero coefficient distributions in noisy 
C\| .and noise-free scenarios in addition to the recovery of a sparse image. 

Keywords: compressed sensing, forward-backward search, sparse signal reconstruction, greedy algorithms, two-stage 
thresholding 
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I— 1.1. Introduction 

] Despite the conventional acquisition process which captures 
-a signal as a whole prior to dimensionality reduction via trans- 
"form coding. Compressed Sensing (CS) aims at acquisition of 

\^ -sparse or compressible signals directly in reduced dimensions. 

lO ^Mathematically, the "compressed" observations are obtained 
-via an observation matrix O 



y = Ox, 



(1) 



X 
S3 



[where x is a /iT-sparse signal of length A^, K is the number of 
■nonzero elements in x, y is the observation vector of length 
'and O is a MxN random matrix with K < M < N. Once y is 
■observed, the CS is to recover x, which is analytically ill-posed 
■following the dimensionality reduction via O. Exploiting the 
sparse nature of x, CS reformulates ([TJ as a sparsity-promoting 
optimization problem 



X = arg min ||x||o s.t. y = €>x. 



(2) 



where ||x||o, called the {q norm by abuse of terminology, de- 
notes the number of nonzero elements in x. As direct solu- 
tion of ^ is computationally intractable, a number of alterna- 
tive and approximate solutions have emerged in the Uterature. 
An overview of mainstream methods is available in which 
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broadly categorizes CS algorithms as convex relaxation tech- 
niques, greedy pursuits, Bayesian methods and nonconvex op- 
timization techniques. Theoretical exact recovery guarantees 
have also been developed mainly under the Restricted Isome- 
try Property (RIP) |2, 3, 4J for some of the algorithms. RIP also 
provides a basis for understanding what type of observation ma- 
trices should be employed. Random matrices with Gaussian or 
Bernoulli entries, or matrices randomly selected from the dis- 
crete Fourier transform satisfy RIP with high probabilities. 

Convex relaxation methods H 0, □> S Istl replace the 
{q minimization in (|2]i with its closest convex approximation, 
the i\ minimization. Following this modification, recovery is 
tractable via convex optimization algorithms such as linear pro- 
gramming, as proposed by Basis Pursuit (BP) |8i], which is 
historically the first convex relaxation algorithm. Greedy pur- 
suit algorithms, such as Matching Pursuit (MP) fl^. Orthog- 
onal MP (OMP) iH, Compressive Sampling MP (CoSaMP) 
lH, Subspace Pursuit (SP) and Iterative Hard Threshold- 
ing (IHT) il4lll5[l . employ iterative mechanisms. Among these, 
MP and OMP build up the support of x iteratively, adding one 
element per iteration. SP and CoSaMP, on the other hand, ap- 
ply a two-stage iterative scheme, which at each iteration first 
expands and then shrinks the support by the same number of el- 
ements, keeping the support size fixed between iterations. IHT 
combines gradient descent with a thresholding step. j3 pro- 
vides an algorithmic framework called Two-Stage Thresholding 
(TST), into which algorithms such as SP, CoSaMP and IHT fall. 

The manuscript at hand proposes a two-stage iterative greedy 
algorithm, called the Forward-Backward Pursuit (FBP). As the 
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name indicates, FBP employs forward selection and backward 
removal steps which iteratively expand and shrink the support 
estimate of x. Though this structure is close to SP or CoSaMP, 
FBP has a major difference from these: Forward and backward 
step sizes of FBP are not the same. Forward step size is higher 
than the backward step size, hence the support estimate is itera- 
tively expanded. With this structure, FBP falls into the general 
category of TST-type algorithms, while iterative expansion of 
the support estimate is investigated for the first time in this con- 
cept. The recovery abilities of FBP are demonstrated on sparse 
signals with different nonzero coefficient distributions in noise- 
less and noisy observation scenarios, and on a sparse image in 
comparison to BP, SP and OMP. These results indicate that FBP 
can perform better than SP and BP in most scenarios, which in- 
dicates that SP is not necessarily the globally optimum TST 
scheme as proposed in lfl6ll . A preliminary version of this work 
has been presented at EUSIPCO'2012. 

Its two-stage structure with different forward and backward 
step sizes gives FBP some advantages over both SP and OMP. 
SP requires a priori estimate of the sparsity level, K, which is 
most of the time not available in practice. On the other hand, 
FBP expands the support estimate from scratch until the resid- 
ual error of the observation is small enough (or vanishes for 
the noiseless case). Hence, FBP does not require K a priori in 
contrast to SP and CoSaMP. Additionally, the backward step of 
FBP can remove some possibly misplaced atoms from the sup- 
port estimate of the support, which is an advantage over forward 
greedy algorithms such as OMP. 

A forward-backward greedy approach for the sparse learn- 
ing problem, FoBa, has been investigated in |jT3l- Though both 
FoBa and FBP consist of iterative forward and backward steps, 
the algorithms have some fundamental differences: The FoBa 
algorithm employs strict forward and backward step sizes of 

1, while the forward and backward step sizes of FBP are typi- 
cally greater than 1 . As a result of the larger step sizes (actually 
the difference between forward and backward step sizes), FBP 
terminates in less iterations. In addition, FoBa takes the back- 
ward step after a few forward steps based on an adaptive deci- 
sion. FBP employs no adaptive criterion for taking the back- 
ward step (note that this is not trivial when the backward step 
size is greater than 1). Finally, FoBa has been applied for the 
sparse learning problem, while we propose and evaluate FBP 
for sparse signal recovery from compressed measurements. 

This manuscript is organized as follows: First, we give a 
brief overview of greedy pursuit algorithms. The FBP algo- 
rithm is introduced in Section |3] Section |4] is devoted to the 
demonstration of FBP recovery performance in comparison to 
the BP, SP and OMP algorithms. We conclude with a brief sum- 
mary of our findings in Section |5] 

2. Greedy Pursuits 

In this section we summarize OMP, SP and TST, which are 
important for our purposes because of their resemblance to the 
FBP algorithm. Beforehand, we define the notation that is used 
throughout the paper: Let 5**' be the estimated support of x 



after the A;th iteration, while S ^''^ stands for the expanded sup- 
port after the forward selection step of the kth iteration, y^*-* 
denotes the approximation of y after the kth iteration and r^*^^ is 
the residue of y after the kth iteration. is the matrix com- 
posed of the columns of O indexed by S . Similarly, Xs is the 
vector consisting of the entries of x with indices in S . 

OMP is a forward greedy algorithm that searches for the 
support of X by identifying one element per iteration. It starts 
with an empty support estimate: 5'*"' - and r'"* = y. At 
the iteration k, OMP expands 5** the support estimate of the 
previous iteration, with the index of the dictionary atom closest 
to r**^"'*, the residue of the last iteration (i.e. it selects the index 
of the largest magnitude entry of O^r^*"'*). Following this, y^*^-* 
is computed via orthogonal projection of y onto O^it) and the 



residue is updated as r' 



■(k) 



y - y 



(k) 



The iterations are carried 



out until the termination criterion is met. 

In this work, we stop OMP when ||r*'^'||2 falls below the 
threshold fiHylb. A similar termination criterion is also defined 
in the next section for FBP. Another termination criterion com- 
monly used for OMP is to run it for exactly K steps. Though, 
the choice for the termination criterion is mostly underestimated 
in literature, recent work of the authors ifisll indicates that run- 
ning for exactly K steps is suboptimal. Moreover, specifying K 
in practice is most of the time not trivial. On the other hand, 
termination based on the residual power of the observed vector 
indeed resembles the actual aim of exactly representing the ob- 
servation, and improves the recovery of OMP-type algorithms 
as the recovery results in 11811 suggest. As a result of employing 
the residue-based termination criterion, OMP recovery results 
in this work are comparable to, and for some cases even better 
than BP and SP. llST provides a comparison of the two termi- 
nation criterion for OMP-type algorithms. In the light of these 
results, the authors feel that the more optimal way of running 
OMP should involve a residue-based termination criterion. 

SP and CoSaMP combine selection of multiple columns per 
iteration with a pruning step, maintaining K element support 
sets throughout the iterations. At iteration k, SP first expands 
the selected support with indices of the K largest magnitude 
entries of O^r^*"'*), obtaining an extended support 5**^' of size 
2K (Alternatively, CoSaMP expands the support by 2K ele- 
ments.). In the second step, the orthogonal projection coeffi- 
cients of y onto O^(i) are computed, and the estimated support 
S is obtained by pruning § to contain only the indices of the 
K largest magnitude projection coefficients, r**^^ is finally com- 
puted using the approximation y**' which is obtained by orthog- 
onal projection of y onto O^ki. The iterations are stopped when 
||r**^'||2 > ||r^*"''||2- Once stopped at iteration /, the final estimate 
of the support is 5*' '\ while orthogonal projection of y onto 
O^i/ i) yields the corresponding nonzero values. CoSaMP and 
SP are provided with RIP-based theoretical guarantees, show- 
ing that these two-stage schemes reduce the reconstruction error 
iteratively when some certain RIP conditions are met. 

Recently, Maleki and Donoho have presented an algorith- 
mic framework called Two-Stage Thresholding (TST) |16|], into 
which algorithms such as SP and CoSaMP fall. As the name 
suggests, this framework employs a two-stage iterative scheme. 
The first stage is similar to iterative thresholding algorithms: It 
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first updates the sparse estimate in the direction of <D^r'' 
then thresholds the result to get a new support estimate. The 
optimal coefficients are computed by orthogonal projection of 
y onto the selected support. A second thresholding operator on 
these optimal coefficients yields the sparse estimate of the cor- 
responding iteration. The evaluation of TST iterative thresh- 



olding algorithms in Ill6ll announces an optimum TST version, 
which finally turns out to be the SP algorithm. Following this 
result, we involve SP in our simulations as a representative of 
TST schemes. Our results below also support the conclusion in 
lfl6ll that SP is the best TST scheme for sparse signals with con- 
stant amplitudes, and hence it is the best worst-case performer. 
In addition, our results also show that SP is not globally the best 
TST algorithm, as FBP outperforms SP for some other distribu- 
tions. 

3. Forward-Backward Pursuit 

Forward greedy algorithms, such as OMP and other MP 
variants, enlarge the support estimate iteratively via forward se- 
lection steps. In such schemes, any element that is inserted into 
the support cannot be removed. That an incorrect element re- 
mains in the support until termination may cause the recovery 
to fail. To illustrate, consider a well-known example: Let x 
be the summation of two equal magnitude sinusoids with very 
close frequencies, f\ and /2, and O be an overcomplete sinu- 
soidal dictionary, containing atoms with frequencies f\ , /2 and 
/s = (/i+/2)/2 among others. The first iteration of OMP se- 
lects the component with frequency /3 . Then, during the next 
iterations, the algorithm tries to cover for this error by choosing 
components other than the two correct ones and fails. 

SP and CoSaMP, on the other hand, are two-stage algo- 
rithms, which are based on iterative expansion and shrinkage of 
the support. This allows both addition and removal of nonzero 
indices to the support estimate. However, the support size of 
these algorithms is fixed between iterations, and they require 
an estimate of the sparsity level K a priori. This is an impor- 
tant handicap in most practical cases, where K is unknown or it 
is not desired to fix it. In III6II . Maleki and Donoho propose a 
tuned SP algorithm, where they suggest a fixed support size for 
a given {M, N} pair. In this approach, the support size is tuned 
based on the sparsity ratio KjM corresponding to the 50% ex- 
act recovery rate for the related MjN value. This motivation 
behind this choice is to select the support size as large as the 
sparsity level SP can exactly recover. As the optimum KjM 
rates are pre-computed, support size can be decided once M 
and are provided. This tuned SP algorithm turns out to be 
optimal TST scheme according to the results in |fl6ll. However, 
these results also indicate that choosing the support size of SP 
larger than the actual sparsity level degrades the recovery per- 
formance. Hence, though this tuned SP may be the optimum 
practical SP algorithm, its performance is still worse than that 
of the SP version that incorporates the actual sparsity level. 

The Forward-Backward Pursuit algorithm provides an itera- 
tive two-stage scheme in which no a priori estimate of the spar- 
sity level K is required. The first stage of FBP is the forward 
step which expands the support estimate by a atoms, where we 



Algorithm 1 FORWARD-BACKWARD PURSUIT 
Input: 3), y 
Define: a, yS, /Tmax, s 
Initialize: = 0, r(°' =y 
A: = 

while true do 
k^k+\ 

Tf - {indices corresponding to a largest magnitude ele- 
ments in O^r**"") 

w = arg min ||y - O^wzlb 

z 

Th = {indices corresponding to smallest magnitude el- 
ements in w} 

w = arg min ||y - 05(i-iz||2 

z 

1-W = y - O^ii-iW 

if llrW|l2<£||yll2or|5«|>/i:™axthen 

break 
end if 
end while 

X^,t) - w 

X(l,2 N}\S":' - 

return x 



call a the forward step size. The second stage, then removes 
j6 atoms from the support. Similar to a, fi is referred to as the 
backward step size. The forward step size is chosen larger than 
the backward step size, hence the support estimate is enlarged 
by ff - j6 atoms at each iteration. The forward and backward 
steps are iterated until the residual power of the observation 
vector either vanishes or is less than a threshold value, which is 
proportional to the real power of the observation vector 

The FBP algorithm can be outlined as follows: We initial- 
ize the support estimate as an empty set: 5"^' - 0, and the 
residue as r'"* - y. At iteration k, first the forward step expands 
^(i-i) by indices of a largest magnitude elements in O^r*^'"'-*. 
This builds up the extended support 5'^*. Then the projection 
coefficients are computed by orthogonal projection of y onto 
O^H:). The backward step prunes 5*^*' by removing the indices 
of p smallest magnitude projection coefficients, which produces 
the support estimate 5**'. Finally, y**^' is computed via orthog- 
onal projection of y onto Ojd) and the residue is updated as 



y - y*^*'. The iterations are carried on until either 



rWll 



vanishes or falls below the threshold e||y||2. In practice, we 
choose E very small (on the order of 10"^ for the experiments 
in this paper) when the observations are noise-free. For noisy 
observations, e should be selected depending on the noise level. 
To avoid the algorithm running for too many iterations in case 
of a failure, the maximum number of iterations is set as Kj^^x- 
After termination at iteration I, S gives the nonzero indices of 
the estimate x, and the corresponding nonzero elements are the 
projection coefficients of y onto 5*". The pseudo-code of FBP 
is given in Algorithm 1 . 

Unlike the subspace methods SP and CoSaMP, FBP does 
not require an a priori estimate of the sparsity level K. As ex- 
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plained above, FBP enlarges the support estimate by ff - yS at 
each iteration until termination of the algorithm which is based 
on the residual power instead of the sparsity level. Hence, nei- 
ther the backward step nor the termination criterion require an 
estimate of the sparsity level K. This, however, comes at a cost: 
The theoretical guarantees cannot be provided in a way similar 
to SP or CoSaMP, which make use of the support size being 
fixed as K after the backward step. For the time being, we can- 
not provide a complete theoretical analysis of FBP, and leave 
this as a future work. Note that, however, most of the theoreti- 
cal analysis steps of SP or CoSaMP also hold for FBP. 

Finally, let's go back to the example with the three sinusoids 
in order to demonstrate the advantages of using FBP instead 
of OMP. Remember that OMP fails for this example. Instead, 
assume we run FBP with a - 3 and /3 - I. During the forward 
step of the first iteration, FBP selects all the three components 
with frequencies /i , /2 and . Following orthogonal projection, 
the backward step will eliminate fj,, and the recovery will be 
successful after the first iteration. 

4. Experimental Evaluation 

This section is reserved for demonstration of the FBP recov- 
ery performance in comparison to BP, SP, OMP. For this pur- 
pose, we run recovery simulations involving different nonzero 
coefficient distributions, noiseless and noisy observations, and 
a sparse image. First, we compare exact recovery rates, average 
recovery error and run times of FBP with those of OMP, SP and 
BP for signals with nonzero elements drawn from the Gaussian 
distribution. In order to generalize the results to a wide range 
of M and K, we provide the empirical phase transition curves, 
which are obtained using the procedure in (l^ . for different 
distributions. Meanwhile, phase transition curves also serve for 
the purpose of investigating optimal a and /3 choices. We then 
demonstrate recovery from noisy observations, and finally test 
our proposal on an image to illustrate the recovery performance 
on a realistic coefficient distribution. 

4.1. Exact Recovery Rates and Reconstruction Error 

First, we compare the exact recovery rates, recovery error 
and run times of FBP using various a and /3 values with those 
of OMP, SP and BP. In these simulations, the signal and ob- 
servation sizes are fixed as N = 256 and M = 100, while K 
varies in [10,45]. For each /iT, recovery simulations are repeated 
over 500 randomly generated samples where the nonzero en- 
tries are drawn from the standard Gaussian distribution. From 
now on, we call this type of sparse signals the Gaussian sparse 
signals. For each test signal, a different observation matrix is 
drawn from the Gaussian distribution with mean and stan- 
dard deviation s is set to 9x10"^ and /iTmax is 55. Note 
that, e and K^^^ are also valid for OMP as OMP is also run 
until ||r||2 < ellylb, with a maximum of K^^^ iterations allowed. 
The recovery error is expressed in terms of Average Normalized 
Mean-Squared-Error (ANMSE), which is defined as 



where x, is the recovery of the /th test vector x, . 

Fig. [T] and |2] depict the reconstruction performance of FBP 
with various a - /3 choices for the Gaussian sparse signals in 
comparison to OMP, BP and SP. Fig. [T]is obtained by varying 
a in [2,30], while /3 - a - I. That is, the forward step size 
varies, while the support estimate is expanded by one element 
per iteration. For Fig.|2] a is selected as 20, and /3 is varied in 
[13,19], changing the increment in the support size per iteration 
for a fixed forward step size. The speeds of the FBP, SP and 
OMP algorithms are also compared, while BP is excluded as 
it is incomparably slower than the other algorithms. Roughly, 
these recovery results indicate that FBP shows better recovery 
than the other candidates in this example. 

According to Fig.[Tl increasing a while keeping the support 
increment, i.e. a - /3, fixed improves the recovery performance 
of FBP. We observe that the exact recovery rate of FBP is signif- 
icantly better than the other candidates. BP, SP and OMP start 
to fail at around K = 25, while, for a > 20, the FBP failures 
begin only when K > 30. As for the ANMSE, FBP is the best 
performer when a > 20. With this setting, BP can beat FBP in 
ANMSE only when K ^ 45. 

In Fig. 121 we observe that increasing j6 for a fixed a also im- 
proves the recovery performance. In this case, exact recovery 
rates significantly increase with the backward step size, while 
the ANMSE values remain mostly unaltered. This indicates 
that when /3 is increased, nonzero elements with smaller magni- 
tudes, which do not significantly change the recovery error, can 
be better recovered. In comparison to the other algorithms in- 
volved in the simulation, FBP is the best performer for /? > 15. 
As for the previous figure, BP can produce lower ANMSE than 
FBP only for/: = 45. 

As for the run times, we expectedly observe that increasing 
a or f3 slows down FBP, as these increase the dimensions of the 
orthogonal projection operations after the forward or backward 
steps. On the other hand, increasing a-yS decreases the number 
of necessary iterations. As a result, FBP terminates faster More 
important for this example, when o- = 20 and /? < or - 2, the run 
time of FBP, SP and OMP are very close. In case o- = 20 and 
fi - 11, the speed of FBP and OMP are almost the same, while 
the reconstruction performance of FBP is significantly better 
than the other algorithms involved. Note that the speed of FBP 
can be improved by removing the orthogonal projection after 
the backward step, at the expense of a slight degradation in the 
recovery performance. 

Finally, we observe that OMP performs better than SP in 
this example. It yields both higher exact recovery rate and lower 
ANMSE than SP does. This is due to the chosen OMP termi- 
nation criterion ||r||2 < ellylb- As discussed in Section|2l such 
a termination criterion provides better recovery results than the 
OMP scheme that runs for exactly K iterations. Moreover, an 
estimate of K is not required anymore. A comparison of the 
OMP recovery performance with the two termination criteria 
can be found in lIlSll . 
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4.2. Phase Transitions 

Phase transitions are important for empirical evaluation of 
CS recovery algorithms over a wide range of sparsity levels and 




Figure 1: Reconstruction results over sparsity for Gaussian sparse vectors. For FBP, /? = a - I. 
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Figure 2; Reconstruction results over sparsity for Gaussian sparse vectors. For FBP, a = 20. 



observation lengths. These simulations involve three different 
nonzero element distributions. First one of them is the uniform 
sparse signals where the nonzero elements are distributed uni- 
formly in [-1, 1]. The second type is the Gaussian sparse sig- 
nals investigated above. The last ensemble involved is the Con- 
stant Amplitude Random Sign (CARS) sparse signals (follow- 
ing the naming in iflill ) where nonzero elements have constant 
magnitude 1 with random signs. Below, we first depict phase 
transitions of FBP with different a and /3 choices for uniform 
sparse signals in order to investigate optimality of these over the 
observation length. These simulations give us a feeling about 
how to choose FBP step size in relation to M and A^. Next, we 
compare FBP with a fixed setting to BP, OMP and SP for all 
three test sets. 

To explain how we obtain the phase transitions, let's first 
define normalized measures for observation length and sparsity 



level: A - M/N and p — K/M. To obtain the phase transi- 
tion curves, we keep the signal length fixed at = 250, and 
alter M and K to sample the {A,p] space for A e [0.1,0.9] 
and p e (0, 1]. For each {A,p] tuple, we randomly generate 
200 sparse instances and run FBP, OMP, BP and SP algorithms 
for recovery. For each test example, we employ an individ- 
ual Gaussian observation matrix as above. We set, e = 10"^ 
and /iTmax = M. Specifying the exact recovery condition as 
||x - x||2 < 10"~||x||2, exact recovery rate is obtained for each 
{A,p] tuple and each algorithm. The phase transitions are then 
obtained using the methodology described in lfl6ll . That is, for 
each A, we employ a generalized linear model with logistic link 
to describe the exact recovery curve over p, and then find the p 
value for 50% exact recovery probability. 

Phase transitions provide us important means for finding an 
empirical way of choosing a and j6. As discussed in llT6ll . the 
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Figure 3: Phase transition of FBP with different forward and backward step sizes for uniform sparse signals. 
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Figure 4: Phase transitions of FBP, BP, SP and OMP for Gaussian, uniform and CARS sparse signals. 



phase transition is mostly a function of the X. Moreover, the 
transition region turns out to be narrower with increasing X. 
These are also supported by some other work in the literature, 
|21L 12211 . Hence, in order to find an optimal set of step 
sizes for FBP, we need to have a look at the phase transitions 
using different a and ji parameters. Intuitively, a and /? should 
not be fixed but proportional to M. Trying to find fixed a and 
values is subject to fail mainly for very low or very high X val- 
ues. In other words, it would not be possible an optimal single 
set {a,/?} for the whole X range. Hence, we suggest selecting a 
proportional to M, and fi proportional to a. In order to evaluate 
these, we run two distinct sets of simulations; First, we vary a 
between O.IM and 0.4M, while ji - a - \. Then we fix alpha 
as 0.2M and vary /? between 0.9q' and 0.7q' in order to see how 
fast the performance degrades with yS. The corresponding phase 
transitions are depicted in Fig.|3] 

On the left-side of Fig. [3] we observe that phase transi- 



tions are stably improved with a until a - 0.3M. Choosing 
a - 0.4M, in contrast, improves the phase transition for the 
mid-/l region, while the results become worse for low and high 
X values. We do not increment a further, however note that do- 
ing so would narrow the mid-/l range where recovery is slightly 
improved, and widen the low and high X regions where the per- 
formance is degraded. Hence, we suggest using a - 0.3M 
for a globally optimum FBP scheme, while this may be in- 
creased if the recovery problem is in the mid-/l region. Note 
however that the complexity of FBP increases with a, and se- 
lecting a - 0.2M may already lead to better phase transition 
than OMP, BP and SP as demonstrated below. 

The graph on the right side of Fig. |3]depicts the phase tran- 
sitions for different y6/a ratios. It is obvious that recovery ac- 
curacy degrades with decreasing /3, however we are interested 
in observing how fast this occurs. The results state that the 
decrease is slight for low X range. This degradation increases 
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slightly with A. The phase transitions for other algorithms are 
not plotted on this figure, but are available below. In compar- 
ison to these, even FBP with /3 between 0.7a and O-Sff pro- 
vides a better phase transition curve on general, while BP can 
do slightly better than this only for A around 0.8-0.9. Note that 
/3/a ratio commands the increment in the support per iteration. 
Reducing this ratio decreases the number of FBP iterations and 
hence the total run time of FBP. Therefore, the results tell us 
that it is possible to reduce the complexity of FBP until fi is 
around 0.7 to 0.8 times a, while the phase transition curve is 
still better than BP, SP and OMP 

Fig. |4] compares the phase transition curve of FBP to those 
of OMP, BP and SP for Gaussian, uniform and CARS sparse 
signals, where FBP step sizes are fixed as a = Q.2M and /3 = 
a-\. For Gaussian and uniform distributions, FBP outperforms 
the other algorithms, while for the CARS case BP is better than 
FBP, and the other greedy algorithms. As a consequence of its 
strong theoretical guarantees and convex structure, the phase 
transition of BP is robust to the coefficient distribution. Note 
also that these empirical BP curves are very close to their the- 
oretical counterpart though we do not explicitly plot the theo- 
retical one. SP, FBP and OMP performances degrade for the 
CARS case, while FBP and OMP curves show the highest vari- 
ation among different distributions. We observe that when the 
nonzero values cover a wide range, as for the Gaussian distri- 
bution, the performances of FBP and OMP are boosted. On the 
other hand, nonzero values of equal magnitudes are the most 
challenging case for these. These observations indicate that 
these algorithms are more effective when the nonzero elements 
span a wide range of magnitudes and when this range gets wide 
enough, even OMP can possess better phase transition than BP. 

4.3. Recovery from Noisy Observations 

Next, we simulate recovery of sparse signals from noise- 
contaminated observations. For this purpose, the noisy observa- 
tion vectors are obtained via contamination by white Gaussian 
noise at signal-to-noise ratios (SNR) varying from 5 to 40 dB. 
In this simulation, FBP is run with a = 20 and fi - 17, as we 
have seen above that OMP and FBP require similar run times 
for this choice, e is selected with respect to the noise level, such 
that the remaining residual power is equal to the noise power. 
The simulation is repeated for 500 Gaussian and 500 uniform 
sparse signals, where - 256 and M - 100. For each test ex- 
ample, we employ an individual Gaussian observation matrix. 
The sparsity levels are selected as = 30 and K - 25 for the 
Gaussian and uniform sparse signals, respectively. K„,ax is 55 
as in the first set of simulations. Fig. |5] depicts the recovery er- 
ror for noisy Gaussian and uniform sparse signals, while the run 
times are compared in Fig.|6l Note that we express the recov- 
ery error in the decibel (dB) scale, calling it the distortion ratio, 
in order to make it better comparable with SNR. Clearly, FBP 
yields the most accurate recovery for both distributions, while 
BP can do slightly better than FBP only when SNR is 5 dB. 
The run times reveal that FBP is not only the most accurate al- 
gorithm in this example, but is also as fast as OMP with ff = 20 
and j6 = 17. Note that, increasing fi beyond 17 would improve 



FBP recovery, while the algorithm would require a longer run 
time. 

4.4. Demonstration on a Sparse Image 

In order to evaluate FBP recovery performance in a more 
realistic case, we demonstrate recovery of the 512x512 image 
"bridge". The recovery is performed using 8x8 blocks. The 
aim for such processing is breaking the recovery problem into 
a number of smaller, and hence simpler, problems. The image 
"bridge" is first preprocessed such that each 8x8 block is K- 
sparse in the 2D Haar Wavelet basis, where K - 12, i.e. for 
each block only the K - \2 largest magnitude wavelet coeffi- 
cients are kept. Note that, in this case the signal is not itself 
sparse, but has a sparse representation in a basis T. Hence, the 
reconstruction dictionary becomes the holographic basis V - 
0*P. From each block, M = 32 observations are taken, where 
the entries of O are randomly drawn from the Gaussian distribu- 
tion with mean and standard deviation 1 jN. The parameters 
are selected as K„ax - 20 and e - 9x10"^. Two set of FBP 
parameters are tested, a - \Q,fi-l and a - 10, j6 = 9. 

Fig.Qshows the preprocessed image "bridge" on the upper 
left. On the upper right is the BP recovery. FBP recovery with 
a - 10, fi - 1 can be found on lower left, and FBP recovery 
with a - 10, fi - 9 is next to it. In this example, BP pro- 
vides a Peak Signal-to-Noise Ratio (PSNR) value of 29.9 dB, 
while the much simpler FBP improves the recovery PSNR up to 
32.5 dB. A careful investigation of the recovered images shows 
that FBP is able to improve the recovery at detailed regions and 
boundaries. This example demonstrates that the simpler FBP 
algorithm is able to perform more accurate and faster recovery 
of a signal with realistic nonzero coefficient distribution than 
the much more sophisticated ii norm minimization approach. 

5. Summary 

This manuscript proposes the forward-backward pursuit al- 
gorithm for CS recovery of sparse signals. FBP, which incor- 
porates iterative forward and backward steps, falls into the cat- 
egory of TST algorithms jT^]. The forward step enlarges the 
support estimate by a atoms, while the backward step removes 
fi < a atoms from it. Hence, this two-stage scheme iteratively 
expands the support estimate for the sparse signal, without re- 
quiring K a priori, as SP or CoSaMP do. In comparison to MP 
type algorithms, FBP provides a backward step for removing 
possibly misidentified atoms from the solution at each iteration. 

FBP recovery performance is demonstrated in Section |5] in 
comparison to OMP, SP and BP algorithms. The simulations 
contain recovery of sparse signals with different distributions of 
the nonzero elements, recovery from noisy observations in ad- 
dition to the demonstration of the algorithm on a sparse image. 
The results indicate that except for sparse signals with constant 
amplitude (the CARS ensemble), FBP can provide better exact 
recovery rates than OMP, BP and SP. We observe that a - 0.3M 
and jS = cf - 1 lead to near-optimum FBP performance, while 
these can be decreased in order to speed up the algorithm with 
a slight sacrifice of the recovery performance. In case of CARS 
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Figure 5: Average recovery distortion over SNR in case of noise contaminated observations. For FBP, a = 2Q and /? = 17. A" = 30 and K = 25 for tfie Gaussian and 
uniform sparse signals, respectively. 
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Figure 6: Average run time per test sample in case of noise contaminated observations. For FBP, ff = 20 and /J = 17. A" = 30 and K = 25 for the Gaussian and 
uniform sparse signals, respectively. 



sparse signals, where the {q and £i norms are equal, BP turns 
out to be the most accurate algorithm, outperforming greedy 
pursuits. Noisy recovery examples state that FBP is robust to 
observation noise, and provides more accurate recovery under 
noise than OMP, BP and SP for Gaussian and uniform sparse 
signals. Finally, demonstration of FBP on a sparse image indi- 
cates recovery abilities of the algorithm for sparse signals with 
realistic nonzero element distributions. We observe that FBP 
recovery of this sparse image is better than the BP recovery. 

Finally, in order to avoid any misinterpretation, we would 
Uke to note that our findings do not contradict with the results 
of Maleki and Donoho in 11611 . especially in terms of OMP per- 
formance. First, we run OMP until the residual power is small 
enough, i.e. ||r||2 < ellylb, by which OMP performance is sig- 
nificantly improved over going exactly for K iterations as in 
lfl6ll . Second, we run the simulations for different nonzero ele- 
ment distributions than in 1 16]. The results in llT6ll indicate that 



SP is the optimal TST scheme for the CARS ensemble. Our re- 
sults for the CARS case are parallel to this: SP performs better 
than FBP and OMP in the CARS scenario, where BP is the best 
performer. However, lll6ll does not contain adequate analysis 
for other types of distributions. Our results show that FBP pro- 
vides better recovery than SP and BP when the magnitudes of 
nonzero elements are not comparable. We observe that the FBP 
performance gets even better when the magnitudes of nonzero 
elements start spanning a wider range, as for the Gaussian dis- 
tribution. Even OMP recovery turns out to be better than SP in 
simulations with Gaussian sparse signals. These indicate that 
SP is not necessarily the optimum TST scheme for all nonzero 
element distributions. From a global perspective, CARS sparse 
signals are the most difficult case for greedy algorithms, and 
hence the corresponding recovery results can be taken as the 
worst case performance for greedy algorithms. Therefore, as 
pronounced in 116,1 . SP has the best worst-case performance 
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Figure 7: Recovery of the image "bridge" using BP and FBP. BP recovery yields 29.9 dB PSNR, while FBP provides 31.5 dB PSNR for a = 10, /? = 7 and 32.5 dB 
PSNRforffi= l0,/3 = 9. 



among TST schemes, while it is outperformed by FBP when 
the nonzero elements do not have comparable amplitudes any- 
more. In addition, we think that only a small portion of the 
real world problems can be represented with constant ampli- 



tude sparse signals. Moreover, for most of the problems, it is 
already known if the signals of interest have comparable mag- 
nitudes or not. Hence, the authors of this paper believe that 
the performance of algorithms for other distributions should be 
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given more credit. Consequently, FBP is a promising algorithm 
for signal recovery from compressed measurements. 
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